[BugFix]fix Qwen3 MoE call gate twice #40664

Merged
robertgshaw2-redhat merged 1 commit into vllm-project:main from jikunshang:kunshang/moe_gate
Apr 23, 2026

@jikunshang commented Apr 23, 2026

Purpose

The Qwen3 MoE model calls the gate GEMM twice; we found this while profiling XPU kernels. Thanks to @zufangzhu for raising it.
See the discussion in #35326 (comment).
This PR follows deepseek_v2.py and the other modeling files by checking is_internal_router.
There may also be some dead code, which will be addressed in follow-up PRs.

cc @robertgshaw2-redhat @bnellnm

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>

@claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify Bot added the qwen (Related to Qwen models) and bug (Something isn't working) labels Apr 23, 2026
@robertgshaw2-redhat enabled auto-merge (squash) April 23, 2026 01:08
@github-actions Bot added the ready (ONLY add when PR is ready to merge/full CI is needed) label Apr 23, 2026

@gemini-code-assist Bot left a comment

Code Review

This pull request updates the forward method in vllm/model_executor/models/qwen3_moe.py to support internal routing within the FusedMoE class. It introduces a conditional check to determine whether to use an internal router or an external gate for computing router logits, providing flexibility for different MoE implementations. I have no feedback to provide as there were no review comments.
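
The conditional described in this review can be sketched in plain Python. This is a toy model, not the code in qwen3_moe.py: CountingGate, ToyExperts, and ToyMoEBlock are hypothetical stand-ins, and only the is_internal_router name comes from the PR. Passing hidden_states as a placeholder for router_logits in the internal-router branch is an assumption of this sketch, and the guard flag exists only to contrast behavior with and without the check.

```python
class CountingGate:
    """Hypothetical stand-in for the router/gate GEMM; counts invocations."""

    def __init__(self):
        self.calls = 0

    def __call__(self, hidden_states):
        self.calls += 1
        # Toy "logits": one score per expert (two experts) for each token.
        return [[x, -x] for x in hidden_states]


class ToyExperts:
    """Hypothetical stand-in for a fused MoE layer that may own its router."""

    def __init__(self, gate=None):
        self.gate = gate

    @property
    def is_internal_router(self):
        # True when the layer computes router logits itself.
        return self.gate is not None

    def __call__(self, hidden_states, router_logits):
        if self.is_internal_router:
            # The passed-in router_logits are ignored; the layer runs the gate.
            router_logits = self.gate(hidden_states)
        out = []
        for x, logits in zip(hidden_states, router_logits):
            expert = max(range(len(logits)), key=logits.__getitem__)
            # Toy experts: expert 0 doubles the token, expert 1 negates it.
            out.append(2 * x if expert == 0 else -x)
        return out


class ToyMoEBlock:
    """Hypothetical sparse MoE block with an optional is_internal_router check."""

    def __init__(self, internal_router, guard=True):
        self.gate = CountingGate()
        self.experts = ToyExperts(self.gate if internal_router else None)
        self.guard = guard

    def forward(self, hidden_states):
        if self.guard and self.experts.is_internal_router:
            # The router runs inside the experts layer, so skip the outer gate.
            return self.experts(hidden_states, router_logits=hidden_states)
        # External router: compute the logits here and hand them over.
        router_logits = self.gate(hidden_states)
        return self.experts(hidden_states, router_logits=router_logits)


unguarded = ToyMoEBlock(internal_router=True, guard=False)
guarded = ToyMoEBlock(internal_router=True, guard=True)
tokens = [1.0, -2.0]
same_output = unguarded.forward(tokens) == guarded.forward(tokens)
print(unguarded.gate.calls, guarded.gate.calls, same_output)
```

In this toy setup the unguarded block runs the gate once in forward and once more inside the experts layer, while the guarded block runs it only once and produces the same output.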

@robertgshaw2-redhat merged commit 342c58b into vllm-project:main Apr 23, 2026
62 checks passed
avinashsingh77 pushed a commit to avinashsingh77/vllm that referenced this pull request Apr 27, 2026
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
Lafunamor pushed a commit to Lafunamor/vllm that referenced this pull request May 1, 2026
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Adrian <info@zzit.ch>

Labels

  • bug: Something isn't working
  • qwen: Related to Qwen models
  • ready: ONLY add when PR is ready to merge/full CI is needed